🧩 Visualization - Modular Functions for Impact Analysis Part II¶

Modular Functions for Impact Analysis & Visualization

This section provides reusable, parameterized functions for analyzing and visualizing performance metrics across temporal and categorical dimensions. Designed for flexibility and clarity, the functions support:

  • Dynamic grouping by time (year, month_name, day_name) or category (store, promo, etc.)

  • Preprocessing filters to exclude non-operational records (e.g., closed stores, zero-sales days)

  • Statistical summaries including mean, standard deviation, and count

  • Ranked insights with volatility and performance differentials

  • Interactive visualizations via Plotly for enhanced interpretability

These tools enable scalable impact assessments and trend analyses across diverse datasets with minimal code repetition.
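As a sketch of the pattern these helpers follow (the function and parameter names here are illustrative, not the actual `scripts/` API), a single parameterized routine can cover the grouping, filtering, and summary points above:

```python
import pandas as pd

def summarize_metric(df, group_col, metric, agg="mean", min_value=0):
    """Group df by group_col and summarize metric.

    Rows where metric <= min_value (e.g. closed stores or
    zero-sales days) are excluded before aggregating.
    """
    filtered = df[df[metric] > min_value]
    return (filtered.groupby(group_col)[metric]
                    .agg([agg, "std", "count"])
                    .sort_values(agg, ascending=False))

# Toy example: the zero-sales row for 2013 is filtered out
toy = pd.DataFrame({
    "year":  [2013, 2013, 2014, 2014],
    "sales": [0, 100, 120, 90],
})
print(summarize_metric(toy, "year", "sales"))
```

Because grouping column, metric, and aggregation are all parameters, the same routine serves `year`, `month_name`, `day_name`, or any categorical column without duplication.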

1. Setup & Import Libraries¶


In [1]:
import time 
from datetime import datetime
In [2]:
# Step 1: Setup & Import Libraries
print("Step 1: Setup and Import Libraries started...")
time.sleep(1)  # Simulate processing time
Step 1: Setup and Import Libraries started...
In [3]:
# Data Manipulation & Processing
import os
import sys
import math
import numpy as np
import pandas as pd

# Warnings
import warnings
warnings.simplefilter('ignore')

🧩 Import Modular Functions¶

In [4]:
# Add the main project directory to path (go up 2 levels)
project_root = os.path.abspath('../../')
if project_root not in sys.path:
    sys.path.insert(0, project_root)

# Now import from scripts (since scripts/ has __init__.py, treat it as a package)
from scripts.viz_top10_stores import analyze_top_performers
from scripts.viz_temporal_trends import analyze_temporal_trends
from scripts.viz_holiday_impact import analyze_stateholiday_impact
from scripts.viz_promo_impact import analyze_promotion_impact
In [5]:
print("="*60)
print("Rossman Store Sales Time Series Analysis - Part 2")
print("="*60)
print("All libraries and modules imported successfully!")
print("Analysis Date:", pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S'))
============================================================
Rossman Store Sales Time Series Analysis - Part 2
============================================================
All libraries and modules imported successfully!
Analysis Date: 2025-08-16 01:00:53
In [6]:
print("✅ Setup and Import Libraries completed.\n")
✅ Setup and Import Libraries completed.

In [7]:
# Start Impact Analysis

viz_impact_analysis_begin = pd.Timestamp.now()

bold_start = '\033[1m'
bold_end = '\033[0m'

print("🔍 Viz Impact Analysis Started ...")
print(f"🟢 Begin Date: {bold_start}{viz_impact_analysis_begin.strftime('%Y-%m-%d %H:%M:%S')}{bold_end}\n")
🔍 Viz Impact Analysis Started ...
🟢 Begin Date: 2025-08-16 01:00:53

Restore the viz_dataset¶


In [8]:
%store -r df_viz_feat
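`%store -r` pulls `df_viz_feat` back from IPython's storage after a previous notebook saved it with `%store df_viz_feat`. On a fresh kernel or another machine that cache may be empty, so a small guard can fall back to rebuilding the frame from disk (the helper name and the parquet path in the comment are illustrative assumptions, not part of the project):

```python
import pandas as pd

def restore_or_load(name, namespace, loader):
    """Return namespace[name] if %store -r already restored it,
    otherwise rebuild it with loader()."""
    if name in namespace:
        return namespace[name]
    return loader()

# Usage sketch; the parquet path is an assumption for illustration:
# df_viz_feat = restore_or_load(
#     "df_viz_feat", globals(),
#     lambda: pd.read_parquet("data/processed/df_viz_feat.parquet"),
# )
```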

Top 5 performing days¶

In [9]:
# Run the analysis
analyze_top_performers(df_viz_feat, 'day', 'sales', 5)
Top 5 Day Performance Analysis:
=======================================================
Rank Day        Average         % of #1   
-------------------------------------------------------
1    Mon        €    8,217      100.0%
2    Sun        €    8,205       99.8%
3    Tue        €    7,091       86.3%
4    Fri        €    7,066       86.0%
5    Thu        €    6,756       82.2%

Summary Statistics:
Total days analyzed: 7
Top 5 average: €7,467
Overall average: €7,134
Top 5 outperform by: 4.7%
Out[9]:
day
Mon    8217.443946
Sun    8204.634815
Tue    7090.987556
Fri    7066.366868
Thu    6756.031605
Name: sales, dtype: float64
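The ranking above presumably reduces to a mean-per-group followed by `nlargest`, with each average expressed as a share of the leader. A minimal reimplementation of that core (illustrative, not the actual `analyze_top_performers` source):

```python
import pandas as pd

def top_performers(df, group_col, metric, n=5):
    """Top n group averages of metric, with each as a % of the leader."""
    avg = df.groupby(group_col)[metric].mean().nlargest(n)
    pct_of_best = 100 * avg / avg.iloc[0]
    return avg, pct_of_best

toy = pd.DataFrame({
    "day":   ["Mon", "Mon", "Sun", "Tue"],
    "sales": [8000, 8400, 8100, 7100],
})
avg, pct = top_performers(toy, "day", "sales", n=3)
print(avg)
print(pct.round(1))
```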

Yearly sales trends¶

In [10]:
# Run the analysis
analyze_temporal_trends(df_viz_feat, 'year', 'sales')
Year Performance Analysis:
==================================================
Rank Year         Average         Std Dev    Count   
------------------------------------------------------------
3    2015.0       €    7,098     € 3,051   165,841
2    2014.0       €    7,026     € 3,129   310,385
1    2013.0       €    6,815     € 3,115   337,924

Key Insights:
Best year: 2015.0 (€7,098)
Worst year: 2013.0 (€6,815)
Performance range: €283
Volatility: 4.1%
Out[10]:
year avg std count
2 2015 7098.0 3051.0 165841
1 2014 7026.0 3129.0 310385
0 2013 6815.0 3115.0 337924
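The avg/std/count table comes from the same grouped-summary pattern; a minimal sketch with hypothetical names (not the actual `analyze_temporal_trends` source):

```python
import pandas as pd

def temporal_summary(df, time_col, metric):
    """Mean, std and count of metric per time_col, best period first."""
    return (df.groupby(time_col)[metric]
              .agg(avg="mean", std="std", count="count")
              .sort_values("avg", ascending=False)
              .reset_index())

toy = pd.DataFrame({
    "year":  [2013, 2013, 2014, 2014],
    "sales": [6000, 7000, 7000, 7200],
})
print(temporal_summary(toy, "year", "sales"))
```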

Promotion impact on customers by month¶

In [11]:
# Run the analysis
analyze_promotion_impact(df_viz_feat, 'customers', 'month')
Promotion Impact Analysis - Customers by Month:
============================================================
Apr         : No Promo    698 | Promo    863 | Lift +23.7%
Aug         : No Promo    684 | Promo    838 | Lift +22.6%
Dec         : No Promo    810 | Promo    998 | Lift +23.2%
Feb         : No Promo    679 | Promo    805 | Lift +18.5%
Jan         : No Promo    661 | Promo    799 | Lift +20.9%
Jul         : No Promo    675 | Promo    855 | Lift +26.7%
Jun         : No Promo    685 | Promo    858 | Lift +25.3%
Mar         : No Promo    681 | Promo    845 | Lift +24.1%
May         : No Promo    728 | Promo    842 | Lift +15.6%
Nov         : No Promo    723 | Promo    845 | Lift +17.0%
Oct         : No Promo    705 | Promo    819 | Lift +16.2%
Sep         : No Promo    679 | Promo    835 | Lift +22.9%

Overall Impact:
Average lift from promotions: +21.3%
Additional revenue per day: 148
Out[11]:
month promo customers
0 Apr No Promo 697.811024
1 Apr Promo 862.927496
2 Aug No Promo 683.697851
3 Aug Promo 838.389713
4 Dec No Promo 810.091292
5 Dec Promo 998.239317
6 Feb No Promo 679.252606
7 Feb Promo 805.110211
8 Jan No Promo 661.389975
9 Jan Promo 799.341393
10 Jul No Promo 675.142099
11 Jul Promo 855.104890
12 Jun No Promo 685.067610
13 Jun Promo 858.470969
14 Mar No Promo 680.716898
15 Mar Promo 844.620730
16 May No Promo 728.140178
17 May Promo 841.680937
18 Nov No Promo 722.733964
19 Nov Promo 845.477052
20 Oct No Promo 704.588344
21 Oct Promo 818.751183
22 Sep No Promo 679.458912
23 Sep Promo 834.975067
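The per-month lift is the ratio of promo to non-promo averages minus one. A hedged sketch of that computation (it assumes a 0/1 `promo` column; the toy values are chosen so the means reproduce January's 661 vs 799 figures above):

```python
import pandas as pd

def promotion_lift(df, metric, period_col, promo_col="promo"):
    """Average metric with/without promotion per period, plus % lift."""
    pivot = (df.groupby([period_col, promo_col])[metric].mean()
               .unstack(promo_col))       # columns: 0 = no promo, 1 = promo
    pivot["lift_pct"] = 100 * (pivot[1] / pivot[0] - 1)
    return pivot

# Toy data whose means match January above (no promo 661, promo 799)
toy = pd.DataFrame({
    "month":     ["Jan"] * 4,
    "promo":     [0, 0, 1, 1],
    "customers": [650, 672, 790, 808],
})
print(promotion_lift(toy, "customers", "month"))
```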

Holiday impact on customers by month¶

In [12]:
# Run the analysis
analyze_stateholiday_impact(df_viz_feat, 'customers', 'month')
State Holiday Impact Analysis - Customers by Month:
======================================================================
Overall Holiday Impact:
-------------------------
Public         :  1,279 (+67.4% vs regular)
Easter         :  1,687 (+120.8% vs regular)
Christmas      :  1,569 (+105.4% vs regular)
Regular Days: €   764 (baseline)

Holiday Impact by Month:
-----------------------------------
Apr         : Regular    774 | Holiday    777 | Impact  +0.5%
Aug         : Regular    750 | Holiday    755 | Impact  +0.6%
Dec         : Regular    886 | Holiday  1,569 | Impact +77.2%
Feb         : Regular    730 | Holiday    743 | Impact  +1.8%
Jan         : Regular    722 | Holiday  1,249 | Impact +73.0%
Jul         : Regular    761 | No holiday data
Jun         : Regular    758 | Holiday  1,096 | Impact +44.5%
Mar         : Regular    768 | Holiday    742 | Impact  -3.4%
May         : Regular    777 | Holiday  1,418 | Impact +82.6%
Nov         : Regular    782 | Holiday  2,578 | Impact +229.5%
Oct         : Regular    752 | Holiday  1,212 | Impact +61.3%
Sep         : Regular    746 | No holiday data

Store Operations Impact:
-------------------------
Total store closures: 168,492 (17.1%)
Holiday closures: 39,994
Regular closures: 128,498
Out[12]:
month stateholiday customers
0 Apr Easter 1618.790698
1 Apr Normal Day 773.629811
2 Aug Normal Day 750.410150
3 Aug Public 754.884615
4 Dec Christmas 1569.225352
5 Dec Normal Day 885.667303
6 Feb Normal Day 729.541412
7 Jan Normal Day 722.209148
8 Jan Public 1249.491803
9 Jul Normal Day 761.380498
10 Jun Normal Day 758.481370
11 Jun Public 1095.966019
12 Mar Easter 2235.937500
13 Mar Normal Day 768.046634
14 May Normal Day 776.910586
15 May Public 1418.377483
16 Nov Normal Day 782.217463
17 Nov Public 2577.615385
18 Oct Normal Day 751.844338
19 Oct Public 1212.465116
20 Sep Normal Day 745.742245
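The headline percentages compare each holiday type's average against the regular-day baseline. A minimal sketch (the column and label names are assumed from the output above; the toy values reproduce the 764-customer baseline and the Public/Easter figures):

```python
import pandas as pd

def holiday_impact(df, metric, holiday_col="stateholiday",
                   baseline="Normal Day"):
    """Average metric per holiday type and % impact vs regular days."""
    means = df.groupby(holiday_col)[metric].mean()
    impact = 100 * (means / means[baseline] - 1)
    return pd.DataFrame({"avg": means, "impact_pct": impact})

# Toy data: baseline of 764 customers, matching the headline figures above
toy = pd.DataFrame({
    "stateholiday": ["Normal Day", "Normal Day", "Public", "Easter"],
    "customers":    [750, 778, 1279, 1687],
})
print(holiday_impact(toy, "customers").round(1))
```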
In [13]:
print("✅ Data Visualization Impact Analysis completed.\n")
✅ Data Visualization Impact Analysis completed.

In [14]:
print("✅ Feature Engineering and Data Visualization (I) completed successfully!")
print(f"🗓️ Analysis Date: {bold_start}{pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}{bold_end}")
✅ Feature Engineering and Data Visualization (I) completed successfully!
🗓️ Analysis Date: 2025-08-16 01:01:01

🌟 Advantages¶

  • Reusable across 'year', 'month', 'dayofweek', etc.

  • Easy to change aggregation type ('sum', 'median', etc.)

  • Consistent naming and sorting

  • Makes your code far more modular for dashboards or reporting
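For instance, swapping the aggregation is a one-argument change rather than a rewrite (toy data for illustration):

```python
import pandas as pd

toy = pd.DataFrame({"month": ["Jan", "Jan", "Feb"], "sales": [10, 30, 20]})

# The same grouped pipeline serves mean, sum, median, ... interchangeably
for agg in ("mean", "sum", "median"):
    print(agg, toy.groupby("month")["sales"].agg(agg).to_dict())
```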

Why Reusability Matters¶


  • 💡 Scalability: You can plug your functions into larger pipelines or production environments without rewrites.
  • 🛠️ Maintainability: A bug fix in one utility can instantly improve multiple workflows.
  • 🚀 Efficiency: Spend less time rewriting logic and more time interpreting results.

Why This Matters for Rossmann Store Sales¶

  • We’ll likely repeat the same aggregations or visualizations across hundreds of stores.
  • Promos, holidays, and weekday patterns demand consistent filtering and analysis.
  • Modular functions help you prototype insights fast, scale across stores, and iterate smoothly.
In [15]:
# End analysis
viz_impact_analysis_end = pd.Timestamp.now()
duration = viz_impact_analysis_end - viz_impact_analysis_begin

# Final summary print
print("\n📋 Feature Engineering & Data Viz Summary")
print(f"🟢 Begin Date: {bold_start}{viz_impact_analysis_begin.strftime('%Y-%m-%d %H:%M:%S')}{bold_end}")
print(f"✅ End Date:   {bold_start}{viz_impact_analysis_end.strftime('%Y-%m-%d %H:%M:%S')}{bold_end}")
print(f"⏱️ Duration:   {bold_start}{str(duration)}{bold_end}")
📋 Feature Engineering & Data Viz Summary
🟢 Begin Date: 2025-08-16 01:00:53
✅ End Date:   2025-08-16 01:01:01
⏱️ Duration:   0 days 00:00:07.717127

Project Design Rationale: Notebook Separation¶

To promote clarity, maintainability, and scalability within the project, data engineering and visualization tasks are intentionally separated into distinct notebooks. This modular approach prevents the accumulation of excessive code in a single notebook, making it easier to debug, update, and collaborate across different stages of the workflow. By isolating data transformation logic from visual analysis, each notebook remains focused and purpose-driven, ultimately enhancing the overall efficiency and readability of the project.